Scaling Up Concurrent Analytical Workloads on Multi-Core Servers

Author: Psaroudakis Iraklis
Publication venue: Lausanne, EPFL
Publication date: 09/11/2016

Today, an ever-increasing number of researchers, businesses, and data scientists collect and analyze massive amounts of data in database systems. The database system needs to process the resulting highly concurrent analytical workloads by exploiting modern multi-socket multi-core processor systems with non-uniform memory access (NUMA) architectures and increasing memory sizes. Conventional execution engines, however, are not designed for many cores, and neither scale nor perform efficiently on modern multi-core NUMA architectures. Firstly, their query-centric approach, where each query is optimized and evaluated independently, can result in unnecessary contention for hardware resources due to redundant work found across queries in highly concurrent workloads. Secondly, they are unaware of the non-uniform memory access costs and the underlying hardware topology, incurring unnecessarily expensive memory accesses and bandwidth saturation. In this thesis, we show how these scalability and performance impediments can be solved by exploiting sharing among concurrent queries and incorporating NUMA-aware adaptive task scheduling and data placement strategies in the execution engine. Regarding sharing, we identify and categorize state-of-the-art techniques for sharing data and work across concurrent queries at run-time into two categories: reactive sharing, which shares intermediate results across common query sub-plans, and proactive sharing, which builds a global query plan with shared operators to evaluate queries. We integrate the original research prototypes that introduce reactive and proactive sharing, perform a sensitivity analysis, and show how and when each technique benefits performance. Our most significant finding is that reactive and proactive sharing can be combined to exploit the advantages of both sharing techniques for highly concurrent analytical workloads. 
Regarding NUMA-awareness, we identify, implement, and compare various combinations of task scheduling and data placement strategies under a diverse set of highly concurrent analytical workloads. We develop a prototype based on a commercial main-memory column-store database system. Our most significant finding is that there is no single strategy for task scheduling and data placement that is best for all workloads. Specifically, inter-socket stealing of memory-intensive tasks can hurt overall performance, and unnecessary partitioning of data across sockets incurs an overhead. For this reason, we implement algorithms that adapt task scheduling and data placement to the workload at run-time. Our experiments show that both sharing and NUMA-awareness can significantly improve the performance and scalability of highly concurrent analytical workloads on modern multi-core servers. Thus, we argue that sharing and NUMA-awareness are key factors for supporting faster processing of big data analytical applications, fully exploiting the hardware resources of modern multi-core servers, and providing a more responsive user experience.

Infoscience - École polytechnique fédérale de Lausanne

Sharing Data and Work Across Concurrent Analytical Queries

Author: Ailamaki Anastasia
Athanassoulis Manos
Psaroudakis Iraklis
Publication venue: 'VLDB Endowment'
Publication date: 01/05/2013

Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the query-centric model to execution models involving sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We experimentally evaluate two sharing methodologies, based on their original prototype systems, that exploit work sharing opportunities among concurrent queries at run-time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators. First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance for cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP gives the best results.
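
As a rough illustration of the idea behind sharing intermediate results of common sub-plans, the sketch below evaluates a sub-plan once and lets a second query reuse the result. It is not the SP prototype described in the abstract: the `SubPlanRegistry` class, the signature tuples, and `scan_and_filter` are hypothetical names, and the sketch ignores timing, pipelining, and the push-based versus pull-based communication models.

```python
from typing import Callable, Dict, Hashable, List


class SubPlanRegistry:
    """Hypothetical registry: evaluates each distinct sub-plan signature once."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, List[int]] = {}
        self.evaluations = 0  # counts how many sub-plans were actually computed

    def evaluate(self, signature: Hashable,
                 compute: Callable[[], List[int]]) -> List[int]:
        # Only the first query with this signature pays for the computation;
        # later queries with an identical sub-plan reuse the stored result.
        if signature not in self._results:
            self._results[signature] = compute()
            self.evaluations += 1
        return self._results[signature]


def scan_and_filter(table: List[int], threshold: int) -> List[int]:
    """Toy sub-plan: scan a column and keep values above a threshold."""
    return [v for v in table if v > threshold]


table = [3, 8, 1, 9, 5]
registry = SubPlanRegistry()
signature = ("scan_and_filter", "table", 4)  # assumed way to identify a sub-plan

# Two different queries (SUM and COUNT) share the same scan-and-filter sub-plan.
q1_sum = sum(registry.evaluate(signature, lambda: scan_and_filter(table, 4)))
q2_count = len(registry.evaluate(signature, lambda: scan_and_filter(table, 4)))
```

In this toy setting the two queries trigger a single evaluation of the shared sub-plan, which is the redundant work that sharing aims to avoid under high concurrency.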

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Task Scheduling for Highly Concurrent Analytical and Transactional Main-Memory Workloads

Author: Ailamaki Anastasia
May Norman
Psaroudakis Iraklis
Scheuer Tobias
Publication date: 28/08/2013

Task scheduling typically employs a worker thread per hardware context to process a dynamically changing set of tasks. It is an appealing solution to exploit modern multi-core processors, as it eases parallelization and avoids unnecessary context switches and their associated costs. Naively bundling DBMS operations into tasks, however, can result in sub-optimal usage of CPU resources: highly contending transactional workloads involve blocking tasks. Moreover, analytical queries assume they can use all available resources while issuing tasks, resulting in an excessive number of tasks and an unnecessary associated scheduling overhead. In this paper, we show how to overcome these problems and exploit the performance benefits of task scheduling for main-memory DBMS. Firstly, we use application knowledge about blocking tasks to dynamically adapt the number of workers and help the OS scheduler saturate CPU resources. In addition, we show that analytical queries should issue a low number of tasks in cases of high concurrency, to avoid excessive synchronization, communication and scheduling costs. To achieve that, we maintain a concurrency hint, reflecting recent CPU availability, that partitionable analytical operations can use as a limit while adjusting their task granularity. We integrate our scheduler into a commercial main-memory column-store, and show that it improves the performance of mixed workloads, by up to 12.5% for analytical queries and 370% for transactional queries.
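
The concurrency hint described in this abstract can be pictured with a minimal sketch, under stated assumptions: the hint is taken to be an integer estimate of currently available hardware contexts, and a partitionable operation uses it as an upper bound on the number of tasks it issues. The function names `choose_task_count` and `partition_rows` are hypothetical and do not come from the paper or from any commercial system.

```python
from typing import List, Tuple


def choose_task_count(max_tasks: int, concurrency_hint: int) -> int:
    """Cap the number of tasks by the hint, but always issue at least one."""
    return max(1, min(max_tasks, concurrency_hint))


def partition_rows(num_rows: int, num_tasks: int) -> List[Tuple[int, int]]:
    """Split [0, num_rows) into num_tasks contiguous, near-equal ranges."""
    base, extra = divmod(num_rows, num_tasks)
    ranges = []
    start = 0
    for i in range(num_tasks):
        # The first `extra` ranges receive one additional row each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Assumed scenario: a scan could use up to 32 tasks, but the system is busy
# and the hint suggests only 3 hardware contexts are currently free.
tasks = choose_task_count(max_tasks=32, concurrency_hint=3)
row_ranges = partition_rows(num_rows=10, num_tasks=tasks)
```

Under high concurrency the hint shrinks, so each operation issues fewer and coarser tasks, which is the behavior the abstract argues reduces synchronization and scheduling costs.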

Infoscience - École polytechnique fédérale de Lausanne

Reactive and Proactive Sharing Across Concurrent Analytical Queries

Author: Ailamaki Anastasia
Athanassoulis Manos
Olma Matthaios
Psaroudakis Iraklis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/04/2014

Today, an ever-increasing amount of data is collected and analyzed by researchers, businesses, and scientists in data warehouses (DW). In addition to the data size, the number of users and applications querying data grows exponentially. The increasing concurrency is itself a challenge in query execution, but also introduces an opportunity favoring synergy between concurrent queries. Traditional execution engines of DW follow a query-centric approach, where each query is optimized and executed independently. On the other hand, workloads with increased concurrency have several queries with common parts of data and work, creating the opportunity for sharing among concurrent queries. Sharing can be reactive to the inherently existing sharing opportunities, or proactive by redesigning query operators to maximize the sharing opportunities. This demonstration showcases the impact of proactive and reactive sharing by comparing and integrating representative state-of-the-art techniques: Simultaneous Pipelining (SP), for reactive sharing, which shares intermediate results of common sub-plans, and Global Query Plans (GQP), for proactive sharing, which build and evaluate a single query plan with shared operators. We visually demonstrate, in an interactive interface, the behavior of both sharing approaches on top of a state-of-the-art storage engine using the original prototypes. We show that pull-based sharing for SP eliminates the serialization point imposed by the original push-based approach. Then, we compare, through a sensitivity analysis, the performance of SP and GQP. Finally, we show that SP can improve the performance of GQP for a query mix with common sub-plans.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

How to Stop Under-Utilization and Love Multicores

Author: Ailamaki Anastasia
Liarou Erietta
Porobic Danica
Psaroudakis Iraklis
Tözün Pinar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/02/2014

Designing scalable transaction processing systems on modern hardware has been a challenge for almost a decade. Hardware trends oblige software to overcome three major challenges to system scalability: (1) Exploiting the abundant thread-level parallelism provided by multicores, (2) Achieving predictably efficient execution despite the variability in communication latencies among cores on multisocket multicores, and (3) Taking advantage of the aggressive micro-architectural features. In this tutorial, we shed light on the above three challenges and survey recent proposals to alleviate them. First, we present a systematic way of eliminating scalability bottlenecks based on minimizing unbounded communication and show several techniques that apply the presented methodology to minimize bottlenecks in major components of transaction processing systems. Then, we analyze the problems that arise from the non-uniform nature of communication latencies on modern multisockets and ways to address them for systems that already scale well on multicores. Finally, we examine the sources of under-utilization within a modern processor and present insights and techniques to better exploit the micro-architectural resources of a processor by improving cache locality at the right level.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement

Author: Ailamaki Anastasia
May Norman
Psaroudakis Iraklis
Scheuer Tobias
Sellami Abdelkader
Publication venue: 'VLDB Endowment'
Publication date: 19/08/2015

Main-memory column-stores are called upon to use modern non-uniform memory access (NUMA) architectures efficiently to service concurrent clients on big data. The efficient usage of NUMA architectures depends on the data placement and scheduling strategy of the column-store. Most column-stores choose a static strategy that involves partitioning all data across the NUMA architecture, and employing a stealing-based task scheduler. In this paper, we implement different strategies for data placement and task scheduling for the case of concurrent scans. We compare these strategies with an extensive sensitivity analysis. Our most significant findings include that unnecessary partitioning can hurt throughput by up to 70%, and that stealing memory-intensive tasks can hurt throughput by up to 58%. Based on our analysis, we envision a design that adapts the data placement and task scheduling strategy to the workload.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure

Author: Chafi Hassan
Chiadmi Dalila
Firmli Soukaina
Hong Sungpack
Lozi Jean-Pierre
Psaroudakis Iraklis
Trigonakis Vasileios
Weld Alexander
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Principles of Distributed Systems (OPODIS 2020)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Scaling up Mixed Workloads: a Battle of Data Freshness, Flexibility, and Scheduling

Author: Ailamaki Anastasia
Böhm Alexander
May Norman
Neumann Thomas
Psaroudakis Iraklis
Sattler Kai-Uwe
Wolf Florian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/09/2014

The common "one size does not fit all" paradigm isolates transactional and analytical workloads into separate, specialized database systems. Operational data is periodically replicated to a data warehouse for analytics. Competitiveness of enterprises today, however, depends on real-time reporting on operational data, necessitating an integration of transactional and analytical processing in a single database system. The mixed workload should be able to query and modify common data in a shared schema. The database needs to provide performance guarantees for transactional workloads, and, at the same time, efficiently evaluate complex analytical queries. In this paper, we share our analysis of the performance of two main-memory databases that support mixed workloads, SAP HANA and HyPer, while evaluating the mixed workload CH-benCHmark. By examining their similarities and differences, we identify the factors that affect performance while scaling the number of concurrent transactional and analytical clients. The three main factors are (a) data freshness, i.e., how recent the data processed by analytical queries is, (b) flexibility, i.e., restricting transactional features in order to increase optimization choices and enhance performance, and (c) scheduling, i.e., how the mixed workload utilizes resources. Specifically for scheduling, we show that the absence of workload management under cases of high concurrency leads to analytical workloads overwhelming the system and severely hurting the performance of transactional workloads.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Dynamic Fine-Grained Scheduling for Energy-Efficient Main-Memory Queries

Author: Ailamaki Anastasia
Ilsche Thomas
Kissinger Thomas
Lehner Wolfgang
Liarou Erietta
Porobic Danica
Psaroudakis Iraklis
Tözün Pinar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Power and cooling costs are some of the highest costs in data centers today, which makes improving energy efficiency crucial. Energy efficiency is also a major design point for chips that power whole ranges of computing devices. One important goal in this area is energy proportionality, the idea that the system's power consumption should be proportional to its performance. Currently, a major trend among server processors, which stems from the design of chips for mobile devices, is the inclusion of advanced power management techniques, such as dynamic voltage-frequency scaling, clock gating, and turbo modes. A lot of recent work on energy efficiency of database management systems is focused on coarse-grained power management at the granularity of multiple machines and whole queries. These techniques, however, cannot efficiently adapt to the frequently fluctuating behavior of contemporary workloads. In this paper, we argue that databases should employ a fine-grained approach by dynamically scheduling tasks using precise hardware models. These models can be produced by calibrating operators under different combinations of scheduling policies, parallelism, and memory access strategies. The models can be employed at run-time for dynamic scheduling and power management in order to improve the overall energy efficiency. We experimentally show that energy efficiency can be improved by up to 4x for fundamental memory-intensive database operations, such as scans.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Databases on modern hardware: how to stop underutilization and love multicores

Author: Ailamaki Anastasia
Jagadish H V
Liarou Erietta
Porobic Danica
Psaroudakis Iraklis
Tözün Pınar
Publication venue: 'Morgan & Claypool Publishers LLC'
Publication date: 01/01/2017

CERN Document Server